-
Notifications
You must be signed in to change notification settings - Fork 15k
[OpenMP][Clang] Use ATTACH
map-type for list-items with base-pointers.
#153683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[OpenMP][Clang] Use ATTACH
map-type for list-items with base-pointers.
#153683
Conversation
For the following: ```c int *p; \#pragma omp target map(p[0]) // (A) (void)p; \#pragma omp target map(p) // (B) (void)p; \#pragma omp target map(p, p[0]) // (C) (void)p; \#pragma omp target map(p[0], p) // (D) (void)p; ``` For (A), the pointer `p` is predetermined `firstprivate`, so it should be (and is) captured by-copy. However, for (B), (C), and (D), since `p` is already listed in a `map` clause, it's not predetermined `firstprivate`, and hence, should be captured by-reference, like any other mapped variable. To ensure the correct handling of (C) and (D), the following changes were made: 1. In SemaOpenMP, we now ensure that `p` is marked to be captured by-reference in these cases. 2. We no longer ignore `map(p)` during codegen of `target` constructs, even if there's another map like `map(p[0])` that would have been mapped using a PTR_AND_OBJ map. 3. For cases like (D), we now handle `map(p)` before `map(p[0])`, so the former gets the TARGET_PARAM flag and sets the kernel argument.
… before their derefs.
The output of the compile-and-run tests is incorrect. These will be used for reference in future commits that resolve the issues. Also updated the existing clang LIT test, target_map_both_pointer_pointee_codegen.cpp, with more regions and more narrowed-down update_cc_test_checks filters.
…mapped-ptrs-by-ref
…ion-using-attach-maptype
…ion-using-attach-maptype
This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, the two possible maps emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ```` (B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!
This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, the two possible maps emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ```` (B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!
ATTACH
map-type for list-items with base-pointers.
…base-ptrs. These have been pulled out of the codegen PR llvm#153683, to reduce the size of that PR.
I've spent some time looking through these patches. First off, I see that the test case from #141042 is partially fixed -- it works as written in the issue, but not if the commented-out
This gives mappings like this:
I think entry 2 is bogus, particularly the "size 1" bit. At a high level, this patch series tackles the problem of parsing complex expressions by keying off the attach-pointer component, whereas my (admittedly less developed!) patch series adjusts the partial-struct (overlap) handling code to group parts of the expression by "containing struct" base. That's similar but not quite the same, and I'm not sure how to reconcile the two approaches, or even if that's necessary -- but if it's not, then I'm not sure how to handle cases such as the above. (This one, I mean: #153672) When I worked on this problem for GCC, I ended up creating a kind of tiny DSL just to parse expressions for mapping, described briefly in this patch submission: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627897.html For LLVM, I don't think we should use exactly the same approach, but I was thinking of something conceptually similar, probably along the lines of an iterator that walks over each "chunk" of an expression, with members to query things like attachment points, containing struct bases, innermost and outermost component accesses, or the "leaf"-type of the chunk (array section expr, array shaping expr, and so on). Incrementing the iterator would then (I think) move to the next indirection level, or the next containing-struct base, with details of course TBD. That would allow the rather complex and hard-to-follow (existing) logic in (That's not the same as the |
Also improves one debug dump regarding pointer-attachment.
Thanks for trying the patch. The commented out code from #141042 is expected to work, but as discussed here, the original is not expected to work as per OpenMP. Since there is no map with v as either the list-item, or the base-pointer of a list-item, the implicit data-mapping rule for
Did you try an older version of the patch? With the current version, I get the following:
Aside from grouping containing structures, another reason for grouping component-lists starting with the same variable, by the complexity of their attach-pointers, is that the kernel argument is determined based on whether the pointer is mapped, or its pointee (see #145454, or line clang/lib/CodeGen/CGOpenMPRuntime.cpp:9749 in this PR). Do you think there are any cases that we won't be able to handle with this approach? With this PR, we are able to reuse the existing logic for determining the partial struct, since the grouping of component-lists happens at a higher level. It might still be desirable to update the partial-struct handling if it makes things easier to read/follow. At the same time, it can continue to assume that it's only working with a set of component-lists that will either have no overlap, or belong to the same containing struct. |
It's not impossible... I think that was with hash 087945a (map-ptr-array-section-using-attach-maptype branch). (Edit: yes, user-error on my part here, apologies!)
I'll go back at look at that in more detail, thank you.
I'm not sure yet.
I'm still working to better understand the original code, and your patch set, so please bear with me! General questions:
|
Here's a test case that might need more complex handling for overlapped mappings:
It works (with this patchset) with NO_OVERLAPS defined, but not with it commented out as shown. |
Thanks for the test! It helped identify the need to use the appropriate pointer-type rather than
From OpenMP's perspective, there can only be one ATTACH operation per list-item. If a list-item is a reference variable that's a member of a struct, like: int x;
int p;
struct S {
int &xr = x;
int &pr = p;
};
A user can map both together, like
variable is actually used in the target region, for correctness?) For After the captured vars have been handled, the remaining map clauses are handled in
The attach handling should not affect handling of overlapping members etc, which is handled by the PartialStruct processing. Each attach-ptr-group (list of component-lists with the same attach-ptr) has its own |
Some new test cases added here: #156703 |
Regarding the
(No overlaps. The whole of
(
(A combination of both the above.
This is the slightly odd case perhaps -- we have no more target parameter (expected), but now we map the whole of JFYI, HTH. |
Thanks for uploading the tests and trying the changes!
The reason for this is that currently (independent of this PR), the
The function that handles mapping on non-captured variables, and on other constructs like |
This PR adds several new tests for mapping of chained structures, i.e. those resembling: #pragma omp target map(tofrom: a->b->c) These are currently XFAILed, although the first two tests actually work with unified memory -- I'm not sure if it's possible to easily improve the condition on the XFAILs in question to make them more accurate. These cases are all fixed by the WIP PR llvm#153683.
This PR adds several new tests for mapping of chained structures, i.e. those resembling: #pragma omp target map(tofrom: a->b->c) These are currently XFAILed, although the first two tests actually work with unified memory -- I'm not sure if it's possible to easily improve the condition on the XFAILs in question to make them more accurate. These cases are all fixed by the WIP PR #153683.
…ucts (#156703) This PR adds several new tests for mapping of chained structures, i.e. those resembling: #pragma omp target map(tofrom: a->b->c) These are currently XFAILed, although the first two tests actually work with unified memory -- I'm not sure if it's possible to easily improve the condition on the XFAILs in question to make them more accurate. These cases are all fixed by the WIP PR llvm/llvm-project#153683.
}; | ||
|
||
GenerateInfoForComponentLists(DeclComponentLists, | ||
/*IsEligibleForTargetParamFlag=*/true); | ||
// Group component lists by their AttachPtrExpr and process them in order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtb20, this comment lists the two reasons for grouping the component-lists based on the attach-ptrs.
IsValidBase = true; | ||
|
||
if (!IsValidBase) { | ||
SemaRef.Diag(ELoc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2172,18 +2173,22 @@ bool SemaOpenMP::isOpenMPCapturedByRef(const ValueDecl *D, unsigned Level, | |||
// - For pointers mapped by value that have either an implicit map or an | |||
// array section, the runtime library may pass the NULL value to the | |||
// device instead of the value passed to it by the compiler. | |||
// - If both a pointer an a dereference of it are mapped, then the pointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #145454 for chages/tests specific to the change in variable capturing. If preferred, the change can be merged before this, but that would lead to some regressions.
/// TODO: Handle cases for target-update, where the list-item is a | ||
/// non-contiguous array-section that still has a base-pointer. | ||
static std::pair<const Expr *, std::optional<size_t>> | ||
findAttachPtrExpr(MappableExprComponentListRef Components, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The helper/utility functions can be reviewed separately as part of this PR, if that is preferable: #155625.
// &p, &p[1], 24*sizeof(float), TARGET_PARAM | TO | FROM | PTR_AND_OBJ | ||
// in unified shared memory mode or for local pointers | ||
// p, &p[1], 24*sizeof(float), TARGET_PARAM | TO | FROM | ||
// p, &p[1], 24*sizeof(float), TARGET_PARAM | TO | FROM // map pointee |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// is same as the use_device_ptr operand. e.g. | ||
// map expr | use_device_ptr expr | current behavior | ||
// ---------|---------------------|----------------- | ||
// p[1] | p | match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See clang/test/OpenMP/target_data_use_device_ptr_codegen.cpp for tests for when we match and when we don't match against an existing map. We also have compile+run tests (for non-USM cases), that are passing with this PR. See offload/test/mapping/use_device_ptr.
// map expr | use_device_addr expr | current | possible restrictive/ | ||
// | | behavior | safer behavior | ||
// ---------|----------------------|-----------|----------------------- | ||
// p | p | match | match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See clang/test/OpenMP/target_data_use_device_addr_codegen.cpp for codegen tests, and offload/test/mapping/use_device_addr for compile+run tests (that now pass with this PR).
// &(ps->ps->ps), &(ps->ps->ps->ps), sizeof(S2*), PTR_AND_OBJ | TO | FROM | ||
// ps, &ps[0], 0, TARGET_PARAM | IMPLICIT // (+) | ||
// &(ps->ps->ps[0]), &(ps->ps->ps->ps), sizeof(S2*), FROM | ||
// &(ps->ps->ps), &(ps->ps->ps->ps), sizeof(void*), ATTACH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the tests at the bottom of clang/test/OpenMP/target_data_codegen.cpp for examples for deeply nested member-of expressions.
} | ||
|
||
// Append any pending zero-length pointers which are struct members and | ||
// used with use_device_ptr or use_device_addr. | ||
// FIXME: This is now redundant as we are not populating DeferredInfo | ||
// anymore. Remove unless we find a legitimate need of populating | ||
// using DefferedInfo during the review process. | ||
auto CI = DeferredInfo.find(Data.first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexey-bataev, do you remember any specific case which required us to use DeferredInfo? With the current codegen, it's not being populated, and all use_device_ptr/addr unit tests are passing.
#pragma omp target map(alloc : f[index][:]) | ||
{ | ||
f[index][2] += 4; | ||
} | ||
|
||
// &f[0], &f[0], 0, PARAM | IMPLICIT | ||
// &f[index][0], &f[index][index+1], (index+1)*4 < 4? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddpagan, it's not related to this PR, but do you know why we are not unconditionally using 0 as the size for assumed-size maps with pointer bases? For example line 14 uses 4, while line 21 uses 0. And here, we are trying to compute some size at runtime.
This adds support for using
ATTACH
map-type for proper pointer-attachment when mapping list-items that have base-pointers.For example, for the following:
The following maps are now emitted by clang:
Previously, the two possible maps emitted by clang were:
(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.
With this change, we are using ATTACH-style maps, like
(A)
, for cases where the expression has a base-pointer. For example:Why a large PR
While it's unfortunate that this PR has gotten large and difficult to review, the issue is that all the functional changes have to be made together, to prevent regressions from partially implemented changes.
For example, the changes to capturing were previously done separately (#145454), but they would still cause stability issues in absence of full attach-mapping. And attach-mapping needs those changes to be able to launch kernels.
We can extract the utilities and functions, like those for finding attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't call those functions, just adds them (#155625). And maybe the change that adds a new error message for use_device_addr on array-sections with non-var base-pointers can be extracted out too (but that will have to be a follow-up change in that case, and we may get comp-fails with this PR when the erroneous case is not caught/diagnosed).
Grouping of maps based on attach base-pointers
We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:
We first map
spp
, thenspp[0]
thenspp[0][0]
andspp[0][0].a
.This allows us to also group "struct" allocation based on their attach pointers. This resolves the issues of us always mapping everything from the beginning of the symbol
spp
. Each group is mapped independently, and at the same level, likespp[0][0]
and its memberspp[0][0].a
, we still get map them together as part of the same contiguous structspp[0][0]
. This resolves issue #141042.use_device_ptr/addr fixes
The handling of
use_device_ptr/addr
was updated to use the attach-ptr information, and works for many cases that were failing before. It has to be done as part of this series because otherwise, the switch from ptr_to_obj to attach-style mapping would have caused regressions in existing use_device_ptr/addr tests.Handling of attach-pointers that are members of implicitly mapped structs:
p
below, is a base-pointer in amap
clause on a target construct (likemap(p[0:1])
, and the base of that struct is either thethis
pointer (implicitly or explicitly), or a struct that is implicitly mapped on that construct, we add an implicitmap(p)
so that we don't implicitly map the full struct.Scope for improvement:
findAttachPtrExpr
is fairly simple, and fast.Needs future work:
ref_ptr/ref_ptee
, andattach
map-type-modifier on them.